Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Neural Information Processing Systems

The history of learning for control has been an exciting back and forth between two broad classes of algorithms: planning and reinforcement learning. Planning algorithms effectively reason over long horizons, but assume access to a local policy and a distance metric over collision-free paths. Reinforcement learning excels at learning policies and the relative values of states, but fails to plan over long horizons. Despite the successes of each method on various tasks, long-horizon, sparse-reward tasks with high-dimensional observations remain exceedingly challenging for both planning and reinforcement learning algorithms. Frustratingly, these sorts of tasks are potentially the most useful, as they are simple to design (a human need only provide an example goal state) and avoid injecting bias through reward shaping.
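To make the planning half of this contrast concrete, here is a minimal sketch of graph search over a buffer of previously visited states, assuming a pairwise distance function is available. It is an illustration under stated assumptions, not the authors' method: the function name `plan_waypoints`, the `max_edge` cutoff, and the use of Euclidean distance as a stand-in for a learned distance are all hypothetical. Nodes are the start, the goal, and every buffer state. An edge is kept only when its estimated distance is short enough to trust a local policy, and Dijkstra's algorithm returns the chain of waypoints.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[float, ...]


def plan_waypoints(
    buffer: Sequence[State],
    start: State,
    goal: State,
    distance: Callable[[State, State], float],  # hypothetical: any pairwise distance estimate
    max_edge: float,  # hypothetical cutoff: only short hops are trusted to a local policy
) -> List[State]:
    """Return the shortest waypoint chain start -> ... -> goal, or [] if unreachable."""
    nodes = [start, *buffer, goal]
    goal_idx = len(nodes) - 1
    best: Dict[int, float] = {0: 0.0}  # best known path length to each node index
    parent: Dict[int, int] = {}  # predecessor on the best known path
    heap = [(0.0, 0)]
    visited = set()
    while heap:
        d_u, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == goal_idx:
            break
        for v in range(len(nodes)):
            if v == u or v in visited:
                continue
            w = distance(nodes[u], nodes[v])
            if w > max_edge:
                continue  # edge too long: treat the hop as unreliable
            if d_u + w < best.get(v, math.inf):
                best[v] = d_u + w
                parent[v] = u
                heapq.heappush(heap, (d_u + w, v))
    if goal_idx not in visited:
        return []
    # Walk predecessors back from the goal to the start (index 0).
    path = [goal_idx]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return [nodes[i] for i in reversed(path)]


# Usage: Euclidean distance stands in for a learned distance estimate.
buffer = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
path = plan_waypoints(buffer, (0.0, 0.0), (4.0, 0.0), math.dist, max_edge=1.5)
```

With `max_edge=1.5`, the direct hop from start to goal is rejected, so the plan has to pass through every intermediate buffer state. This mirrors the abstract's point that planning handles the long horizon only when a trustworthy local distance is supplied.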


Reviews: Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Neural Information Processing Systems

Compact search spaces would confer computational benefits, if nothing else. Overall, studying how compact representations of the state might compare when used inside graph search seems like a nice way to evaluate how much utility the distributional RL component adds to the overall approach.


Reviews: Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Neural Information Processing Systems

The paper presents a general-purpose control algorithm that combines planning and RL to solve tasks with sparse rewards or long horizons. This algorithm is novel and interesting. The three reviewers agree that the contributions presented here should be published at the conference. The rebuttal resolved most of the requests for clarification. The reviewers also suggest various ways to further improve the manuscript.


Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Eysenbach, Ben, Salakhutdinov, Russ R., Levine, Sergey

Neural Information Processing Systems